AITopics | provably efficient rl

Collaborating Authors

provably efficient rl

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Provably Efficient RL for Linear MDPs under Instantaneous Safety Constraints in Non-Convex Feature Spaces

Roknilamouki, Amirhossein, Ghosh, Arnob, Shi, Ming, Nourzad, Fatemeh, Ekici, Eylem, Shroff, Ness B.

arXiv.org Artificial IntelligenceFeb-25-2025

In Reinforcement Learning (RL), tasks with instantaneous hard constraints present significant challenges, particularly when the decision space is non-convex or non-star-convex. This issue is especially relevant in domains like autonomous vehicles and robotics, where constraints such as collision avoidance often take a non-convex form. In this paper, we establish a regret bound of $\tilde{\mathcal{O}}\bigl(\bigl(1 + \tfrac{1}{\tau}\bigr) \sqrt{\log(\tfrac{1}{\tau}) d^3 H^4 K} \bigr)$, applicable to both star-convex and non-star-convex cases, where $d$ is the feature dimension, $H$ the episode length, $K$ the number of episodes, and $\tau$ the safety threshold. Moreover, the violation of safety constraints is zero with high probability throughout the learning process. A key technical challenge in these settings is bounding the covering number of the value-function class, which is essential for achieving value-aware uniform concentration in model-free function approximation. For the star-convex setting, we develop a novel technique called Objective Constraint-Decomposition (OCD) to properly bound the covering number. This result also resolves an error in a previous work on constrained RL. In non-star-convex scenarios, where the covering number can become infinitely large, we propose a two-phase algorithm, Non-Convex Safe Least Squares Value Iteration (NCS-LSVI), which first reduces uncertainty about the safe set by playing a known safe policy. After that, it carefully balances exploration and exploitation to achieve the regret bound. Finally, numerical simulations on an autonomous driving scenario demonstrate the effectiveness of NCS-LSVI.

instantaneous safety constraint, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2502.18655

Country: North America > United States (0.14)

Genre:

Research Report > New Finding (0.45)
Research Report > Promising Solution (0.34)

Industry:

Transportation > Ground > Road (0.34)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Add feedback

Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation

Kitamura, Toshinori, Ghosh, Arnob, Kozuno, Tadashi, Kumagai, Wataru, Kasaura, Kazumi, Hoshino, Kenta, Hosoe, Yohei, Matsuo, Yutaka

arXiv.org Artificial IntelligenceFeb-17-2025

We study the reinforcement learning (RL) problem in a constrained Markov decision process (CMDP), where an agent explores the environment to maximize the expected cumulative reward while satisfying a single constraint on the expected total utility value in every episode. While this problem is well understood in the tabular setting, theoretical results for function approximation remain scarce. This paper closes the gap by proposing an RL algorithm for linear CMDPs that achieves $\tilde{\mathcal{O}}(\sqrt{K})$ regret with an episode-wise zero-violation guarantee. Furthermore, our method is computationally efficient, scaling polynomially with problem-dependent parameters while remaining independent of the state space size. Our results significantly improve upon recent linear CMDP algorithms, which either violate the constraint or incur exponential computational costs.

linear function approximation, machine learning, reinforcement learning, (5 more...)

arXiv.org Artificial Intelligence

2502.10138

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

Add feedback

Provably efficient RL with Rich Observations via Latent State Decoding

Du, Simon S., Krishnamurthy, Akshay, Jiang, Nan, Agarwal, Alekh, Dudík, Miroslav, Langford, John

arXiv.org Machine LearningJan-25-2019

We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps---where previously decoded latent states provide labels for later regression problems---and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over $Q$-learning with na\"ive exploration, even when $Q$-learning has cheating access to latent states.

probability, provably efficient rl, rich observation, (14 more...)

arXiv.org Machine Learning

1901.09018

Country:

North America > United States > New York (0.04)
North America > United States > Illinois (0.04)
Europe > Sweden > Skåne County > Malmö (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
(2 more...)

Add feedback